1. Description of the problem and data

In this challenge we need to use a generative adversarial network (GAN) to create new images of dogs. The input data contains 20,580 images covering 120 dog breeds.

2. Exploratory Data Analysis (EDA) — Inspect, Visualize and Clean the Data

Step 3 Model Architecture

Describe your model architecture and reasoning for why you believe that specific architecture would be suitable for this problem.

  1. Convolutional Autoencoders
    • Two parts: an encoder and a decoder. The encoder compresses the image into a lower-dimensional latent representation, in our case from 64 by 64 down to 8 by 8.
    • The decoder mirrors the encoder, with the same architecture in reverse. It transforms the latent representation back into a fake image.
    • The size of the latent representation affects the result significantly: if the latent dim is 16 or 32 instead of 8, the quality of the fake images is much worse.
  2. Generative adversarial networks:
    • Generator: takes random noise (Gaussian noise) as input and outputs fake images. The noise input acts like the coding, or latent representation, of an image.
    • Discriminator: a binary classifier, trained to tell whether an input image is fake or real.
    • Some conventional architecture settings are used: pooling layers are replaced by Conv2D layers with a larger stride; batch norm is used in both G and D; the output layer of the generator is tanh because we want the output to range from -1 to 1; leaky ReLU is used as the activation.
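
As a rough check of the sizes mentioned above, the sketch below applies the standard convolution output-size formula to a stack of stride-2 layers, which is how a strided Conv2D takes the place of pooling. The kernel size of 3, padding of 1, and the count of three layers are assumptions for illustration; the notebook's exact layer settings are not stated here.

```python
def conv_output_size(size, kernel=3, stride=2, padding=1):
    """Output size of one convolution along a single spatial axis.

    kernel, stride and padding are assumed values, not taken from the notebook.
    """
    return (size + 2 * padding - kernel) // stride + 1


def encoder_sizes(input_size=64, num_layers=3):
    """Spatial size after each stride-2 convolution in a hypothetical encoder."""
    sizes = [input_size]
    for _ in range(num_layers):
        sizes.append(conv_output_size(sizes[-1]))
    return sizes


print(encoder_sizes())  # [64, 32, 16, 8]
```

Under these assumptions each stride-2 layer halves the width and height, so three layers take a 64 by 64 image down to 8 by 8 without any pooling layer.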

Plot the fake images generated by the autoencoder.

Deep Convolutional GAN
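
The two networks are trained against each other with opposing objectives. The sketch below shows one common formulation, binary cross-entropy with the non-saturating generator loss, written in plain Python so the arithmetic is visible. It is an assumption that this is the loss used here; the function names `bce`, `discriminator_loss` and `generator_loss` are hypothetical, and the scores stand for the discriminator's sigmoid outputs.

```python
import math


def bce(prediction, target, eps=1e-7):
    """Binary cross-entropy for one predicted probability and a 0/1 target."""
    p = min(max(prediction, eps), 1 - eps)  # clip to avoid log(0)
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))


def discriminator_loss(real_scores, fake_scores):
    """D wants real images scored as 1 and fake images scored as 0."""
    real_loss = sum(bce(p, 1.0) for p in real_scores) / len(real_scores)
    fake_loss = sum(bce(p, 0.0) for p in fake_scores) / len(fake_scores)
    return real_loss + fake_loss


def generator_loss(fake_scores):
    """G wants its fake images scored as 1 (non-saturating form)."""
    return sum(bce(p, 1.0) for p in fake_scores) / len(fake_scores)


# A confident, correct discriminator has a low loss; the generator's is then high.
print(discriminator_loss([0.9, 0.8], [0.1, 0.2]))
print(generator_loss([0.1, 0.2]))
```

When the discriminator cannot tell real from fake and outputs 0.5 everywhere, the generator loss equals log 2, which is a useful reference value when reading training logs.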

Step 4 Results and Analysis

Run hyperparameter tuning, try different architectures for comparison, apply techniques to improve training or performance, and discuss what helped.

  1. Autoencoder: the training process is fast, about 60s per epoch, but the generated fake images are a little fuzzy.

  2. GAN: training takes much more time than the autoencoder, even for this basic GAN with 3 layers in D and 3 layers in G. However, with a more complex GAN the images are sometimes better than the autoencoder's.

    • A difficulty when training the GAN: the generator becomes less diverse and produces images focused on a few classes, without learning how to generate new images of the other classes (a problem commonly called mode collapse).
  3. Quality of results: these can be compared with the MiFID metric (https://www.kaggle.com/code/wendykan/demo-mifid-metric-for-dog-image-generation-comp/notebook), or simply by displaying the fake images and inspecting them by eye.
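
Besides visual inspection, a crude numeric check can flag the loss of diversity described above. The sketch below computes the average pixel-space distance between generated samples; a value that falls toward zero during training suggests the generator is producing near-identical images. This is only a heuristic under the assumption that images are flattened to lists of floats. It is not the MiFID metric, and `mean_pairwise_distance` is a hypothetical helper.

```python
import math
from itertools import combinations


def mean_pairwise_distance(samples):
    """Average Euclidean distance over all pairs of flattened images."""
    pairs = list(combinations(samples, 2))
    if not pairs:
        raise ValueError("need at least two samples")
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


# Toy 2-pixel "images": a varied batch versus a collapsed batch.
varied = [[0.0, 0.0], [3.0, 4.0]]
collapsed = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]

print(mean_pairwise_distance(varied))     # 5.0
print(mean_pairwise_distance(collapsed))  # 0.0
```

Pixel distance ignores semantic content, so it is best tracked as a trend over epochs on a fixed number of samples and not read as an absolute quality score.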

Example output 1 from the autoencoder:

image.png

Example 2:

image.png

Step 5 Conclusion

Discuss and interpret results as well as learnings and takeaways. What did and did not help improve the performance of your models? What improvements could you try in the future?

References:

https://www.kaggle.com/code/jesucristo/gan-introduction/notebook

https://www.kaggle.com/code/wendykan/gan-dogs-starter/notebook

https://www.kaggle.com/code/roydatascience/introduction-to-generative-adversarial-networks/notebook

https://www.kaggle.com/code/cdeotte/dog-autoencoder/notebook